Parsing di Corpora di Apprendenti di Italiano: un Primo Studio su VALICO (Parsing Italian Learner Corpora: a Case Study on VALICO)

Authors

Elisa Corino

Claudio Russo

Abstract

English. Modern learner corpora are now routinely PoS tagged, whereas syntactic parsing is much less frequent. This paper proposes a first attempt at parsing applied to a subcorpus of VALICO, in an effort to identify key elements to be further used to parse corpora of Italian as a foreign language in


Similar resources

Studio sull'Ordine dei Costituenti nel Confronto tra Generi e Complessità (Analysis of Constituents Order Across Textual Genres and Complexity)

Italian. In this article we present a study of constituent order in Italian, based on corpora automatically annotated up to the level of dependency parsing. The comparative investigation made it possible to assess the influence of both textual genre and linguistic complexity on the distribution of syntactic markedness phenomena. English. In this paper we present a st...


Generalization in Native Language Identification: Learners versus Scientists

English. Native Language Identification (NLI) is the task of recognizing an author’s native language from text in another language. In this paper, we consider three English learner corpora and one new, presumably more difficult, scientific corpus. We find that the scientific corpus is only about as hard to model as a less-controlled learner corpus, but cannot profit as much from corpus combinat...


Building a Social Media Adapted PoS Tagger Using FlexTag -- A Case Study on Italian Tweets

English. We present a detailed description of our submission to the PoSTWITA shared task for PoS tagging of Italian social media text. We train a model based on FlexTag using only the provided training data and external resources such as word clusters and a PoS dictionary, which are built from publicly available Italian corpora. We find that this minimal adaptation strategy, which already worked we...


Tree Kernels-based Discriminative Reranker for Italian Constituency Parsers

English. This paper aims to fill the gap between the accuracy of Italian and English constituency parsing. First, we adapt the Bllip parser, i.e., the most accurate constituency parser for English, also known as the Charniak parser, to Italian and train it on the Turin University Treebank (TUT). Second, we design a parse reranker based on Support Vector Machines using tree kernels, where ...


Dealing with Italian Adjectives in Noun Phrase: a Study Oriented to Natural Language Generation

English. This paper describes a theoretical and empirical investigation into the position of adjectives in Italian. The long-term goal that guided the study is the formalization of this information in a natural language generation system. Given that adjectives mainly occur within noun phrases, we focused on them and we collected data from corpora representing very differe...




Publication date: 2016
